Re: How can I prevent duplicate inserts by multiple concurrent threads ?

From: Daniel Staal
Subject: Re: How can I prevent duplicate inserts by multiple concurrent threads ?
Date:
Msg-id: EC059D342FB2C94591E50167@[192.168.1.50]
In reply to: Re: How can I prevent duplicate inserts by multiple concurrent threads ?  (Hursh Jain <hurshjain@beesell.com>)
List: pgsql-novice
--As of January 7, 2015 3:01:42 PM -0500, Hursh Jain is alleged to have
said:

> Right, maybe that's what I should be doing. The reason I didn't do that
> initially was because creating new data can be *very* expensive. We have
> about a 100 different chunks and if say 10 threads hit the database and
> each thread re-creates 100 chunks, we will end up trying to save 100*10
> (1000) chunks, 900 of which will be rightly rejected because of the
> uniqueness constraint, so that time spent in creating those chunks would
> have been wasted to the annoyance of the clients.
>
> It would be nice to see if that data already exists before we try to
> re-insert it....but how ? the "select for update" doesn't seem to be
> working as I hoped..

--As for the rest, it is mine.

It won't completely solve the issue, but take a look at savepoints and
transactions.  My flow would probably be:

Start transaction
Select row to see if data exists
If so, skip
If not, create data and insert
End transaction
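The first flow above could be sketched roughly like this. This is only an illustration: it uses Python's stdlib sqlite3 so it is self-contained, though the thread is about PostgreSQL, and the table name, column names, and `insert_if_missing` helper are all made up for the example:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE chunks (key TEXT PRIMARY KEY, data BLOB)")

def insert_if_missing(conn, key, make_data):
    """Check-then-insert inside one transaction; skip the expensive
    data creation entirely if the row already exists."""
    with conn:  # opens a transaction; commits on success, rolls back on error
        row = conn.execute("SELECT 1 FROM chunks WHERE key = ?",
                           (key,)).fetchone()
        if row is not None:
            return False  # already there: skip
        conn.execute("INSERT INTO chunks VALUES (?, ?)", (key, make_data()))
        return True

insert_if_missing(conn, "a", lambda: b"expensive")  # inserts
insert_if_missing(conn, "a", lambda: b"expensive")  # skips, data not rebuilt
```

As the text says, this doesn't completely close the race on its own (two threads can both pass the SELECT before either commits), but it avoids the wasted data creation in the common case.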

If you can generate the key without generating the data, I'd probably do:

Start transaction
Select row to see if data exists
If not, insert key, else end transaction
Set savepoint
Create data.
Update the key's row, rolling back to the savepoint and retrying if there
is an issue.
End transaction

(Though the 'select row' and 'insert key' of the second could probably be
combined to just 'insert key'.)
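The second flow (with the check and key-insert combined, as the parenthetical suggests) might look something like this. Again a sketch only, using stdlib sqlite3 rather than PostgreSQL, with hypothetical names (`chunks`, `claim_then_fill`); the idea is that the cheap key insert claims the row first, so duplicates fail before any expensive work, and the savepoint protects the key claim while the data is built:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None  # take manual control of transactions
conn.execute("CREATE TABLE chunks (key TEXT PRIMARY KEY, data BLOB)")

def claim_then_fill(conn, key, make_data, retries=3):
    """Insert the key first; build the expensive data under a savepoint."""
    conn.execute("BEGIN")
    try:
        conn.execute("INSERT INTO chunks (key) VALUES (?)", (key,))
    except sqlite3.IntegrityError:
        conn.execute("ROLLBACK")  # another worker already claimed this key
        return False
    for _ in range(retries):
        conn.execute("SAVEPOINT made_key")
        try:
            conn.execute("UPDATE chunks SET data = ? WHERE key = ?",
                         (make_data(), key))
            conn.execute("RELEASE made_key")
            conn.execute("COMMIT")
            return True
        except Exception:
            conn.execute("ROLLBACK TO made_key")  # key survives; try again
    conn.execute("ROLLBACK")  # give up: undo the key claim too
    raise RuntimeError("could not build data for key %r" % key)
```

The design point is the ordering: the uniqueness check is the key insert itself, so only one thread per key ever pays for `make_data`.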

Each transaction will get you an internally-consistent view of the
database, so the select will be on the same dataset you eventually insert
into.

My other question from your comments above is whether this should all
actually be in *one* table.  If you are collecting a lot of different
information from different sources, it might work better to keep that data
in its own tables and just reference them from this one, which would then
be quicker and cheaper to update.  (I obviously don't know the situation;
you might not be able to do that for any of a hundred reasons, or the
gathering and correlating might be generating something more unique to
each row.  It's just a thought from the simplified description you've
given.)

Daniel T. Staal

---------------------------------------------------------------
This email copyright the author.  Unless otherwise noted, you
are expressly allowed to retransmit, quote, or otherwise use
the contents for non-commercial purposes.  This copyright will
expire 5 years after the author's death, or in 30 years,
whichever is longer, unless such a period is in excess of
local copyright law.
---------------------------------------------------------------

